<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>Chapter 11 Going 3D: The PDB module &mdash; Biopython_en 1.0 documentation</title>
    
    <link rel="stylesheet" href="_static/default.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    './',
        VERSION:     '1.0',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <link rel="top" title="Biopython_en 1.0 documentation" href="index.html" />
    <link rel="next" title="Chapter 12 Bio.PopGen: Population genetics" href="chr12.html" />
    <link rel="prev" title="Chapter 10 Swiss-Prot and ExPASy" href="chr10.html" /> 
  </head>
  <body>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="chr12.html" title="Chapter 12 Bio.PopGen: Population genetics"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="chr10.html" title="Chapter 10 Swiss-Prot and ExPASy"
             accesskey="P">previous</a> |</li>
        <li><a href="index.html">Biopython_en 1.0 documentation</a> &raquo;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
            
  <div class="section" id="chapter-11-going-3d-the-pdb-module">
<h1>Chapter 11  Going 3D: The PDB module<a class="headerlink" href="#chapter-11-going-3d-the-pdb-module" title="Permalink to this headline">¶</a></h1>
<p>Bio.PDB is a Biopython module that focuses on working with crystal
structures of biological macromolecules. Among other things, Bio.PDB
includes a PDBParser class that produces a Structure object, which can
be used to access the atomic data in the file in a convenient manner.
There is limited support for parsing the information contained in the
PDB header.</p>
<div class="section" id="reading-and-writing-crystal-structure-files">
<h2>11.1  Reading and writing crystal structure files<a class="headerlink" href="#reading-and-writing-crystal-structure-files" title="Permalink to this headline">¶</a></h2>
<div class="section" id="reading-a-pdb-file">
<h3>11.1.1  Reading a PDB file<a class="headerlink" href="#reading-a-pdb-file" title="Permalink to this headline">¶</a></h3>
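<p>First we create a <tt class="docutils literal"><span class="pre">PDBParser</span></tt> object, for example with <tt class="docutils literal"><span class="pre">parser</span> <span class="pre">=</span> <span class="pre">PDBParser(PERMISSIVE=1)</span></tt>.</p>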
<p>The <tt class="docutils literal"><span class="pre">PERMISSIVE</span></tt> flag indicates that a number of common problems (see
<a class="reference external" href="#problem%20structures">11.7.1</a>) associated with PDB files will be
ignored (but note that some atoms and/or residues will be missing). If
the flag is not set, a <tt class="docutils literal"><span class="pre">PDBConstructionException</span></tt> is raised
if any problems are detected during the parse operation.</p>
<p>The Structure object is then produced by letting the <tt class="docutils literal"><span class="pre">PDBParser</span></tt>
object parse a PDB file (the PDB file in this case is called
’pdb1fat.ent’, ’1fat’ is a user defined name for the structure):</p>
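<p>A minimal sketch of these two steps. It assumes a recent Biopython in which <tt class="docutils literal"><span class="pre">get_structure</span></tt> accepts an open handle as well as a file name and in which <tt class="docutils literal"><span class="pre">PDBParser</span></tt> takes a <tt class="docutils literal"><span class="pre">QUIET</span></tt> argument. The in-memory fragment <tt class="docutils literal"><span class="pre">PDB_TEXT</span></tt> is a hypothetical stand-in for the file.</p>

```python
from io import StringIO

from Bio.PDB.PDBParser import PDBParser

# Hypothetical two-residue fragment standing in for a real file such as pdb1fat.ent.
PDB_TEXT = """\
ATOM      1  N   GLY A   1       4.475   6.362   5.000  1.00 20.00
ATOM      2  CA  GLY A   1       5.000   5.000   5.000  1.00 20.00
ATOM      3  C   GLY A   1       6.526   5.000   5.000  1.00 20.00
ATOM      4  N   SER A   2       7.200   6.100   5.000  1.00 20.00
ATOM      5  CA  SER A   2       8.600   6.200   5.200  1.00 20.00
"""

# PERMISSIVE=True tolerates common format problems; QUIET=True silences warnings.
parser = PDBParser(PERMISSIVE=True, QUIET=True)

# First argument: user-defined structure id. Second argument: file name or handle.
structure = parser.get_structure("1fat", StringIO(PDB_TEXT))
print(structure.get_id(), len(structure))
```

<p>With a file on disk, the second argument would simply be its name, e.g. <tt class="docutils literal"><span class="pre">parser.get_structure("1fat",</span> <span class="pre">"pdb1fat.ent")</span></tt>.</p>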
<p>You can extract the header and trailer (simple lists of strings) of the
PDB file from the PDBParser object with the <tt class="docutils literal"><span class="pre">get_header</span></tt> and
<tt class="docutils literal"><span class="pre">get_trailer</span></tt> methods. Note however that many PDB files contain
headers with incomplete or erroneous information. Many of the errors
have been fixed in the equivalent mmCIF files. <em>Hence, if you are
interested in the header information, it is a good idea to extract
information from mmCIF files using the</em> <tt class="docutils literal"><span class="pre">MMCIF2Dict</span></tt> <em>tool described
below, instead of parsing the PDB header.</em></p>
<p>Now that is clarified, let’s return to parsing the PDB header. The
structure object has an attribute called <tt class="docutils literal"><span class="pre">header</span></tt> which is a Python
dictionary that maps header records to their values.</p>
<p>Example:</p>
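<p>A small sketch of reading the header dictionary. The header records below are made up for illustration and do not describe a real PDB entry; parsing from a handle assumes a recent Biopython.</p>

```python
from io import StringIO

from Bio.PDB.PDBParser import PDBParser

# Hypothetical header records followed by a single atom.
PDB_TEXT = """\
HEADER    HYPOTHETICAL PROTEIN                    01-JAN-00   1ABC
REMARK   2 RESOLUTION.    2.80 ANGSTROMS.
ATOM      1  CA  GLY A   1       5.000   5.000   5.000  1.00 20.00
"""

structure = PDBParser(QUIET=True).get_structure("1abc", StringIO(PDB_TEXT))

# The header attribute is a plain dictionary keyed by record name.
header = structure.header
print(header["resolution"])
print(header["deposition_date"])
print(sorted(header))  # every key that this Biopython version provides
```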
<p>The available keys are <tt class="docutils literal"><span class="pre">name</span></tt>, <tt class="docutils literal"><span class="pre">head</span></tt>, <tt class="docutils literal"><span class="pre">deposition_date</span></tt>,
<tt class="docutils literal"><span class="pre">release_date</span></tt>, <tt class="docutils literal"><span class="pre">structure_method</span></tt>, <tt class="docutils literal"><span class="pre">resolution</span></tt>,
<tt class="docutils literal"><span class="pre">structure_reference</span></tt> (which maps to a list of references),
<tt class="docutils literal"><span class="pre">journal_reference</span></tt>, <tt class="docutils literal"><span class="pre">author</span></tt>, and <tt class="docutils literal"><span class="pre">compound</span></tt> (which maps to a
dictionary with various information about the crystallized compound).</p>
<p>The dictionary can also be created without creating a <tt class="docutils literal"><span class="pre">Structure</span></tt>
object, i.e. directly from the PDB file:</p>
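<p>One way to do this is the <tt class="docutils literal"><span class="pre">parse_pdb_header</span></tt> function, sketched here on hypothetical in-memory header records rather than a real file.</p>

```python
from io import StringIO

from Bio.PDB import parse_pdb_header

# Hypothetical header records; with a real file, open it and pass the handle.
PDB_TEXT = """\
HEADER    HYPOTHETICAL PROTEIN                    01-JAN-00   1ABC
REMARK   2 RESOLUTION.    2.80 ANGSTROMS.
ATOM      1  CA  GLY A   1       5.000   5.000   5.000  1.00 20.00
"""

# Only the records before the first ATOM/HETATM/MODEL line are read.
header_dict = parse_pdb_header(StringIO(PDB_TEXT))
print(header_dict["head"])
```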
</div>
<div class="section" id="reading-an-mmcif-file">
<h3>11.1.2  Reading an mmCIF file<a class="headerlink" href="#reading-an-mmcif-file" title="Permalink to this headline">¶</a></h3>
<p>As with PDB files, first create an
<tt class="docutils literal"><span class="pre">MMCIFParser</span></tt> object, then call its <tt class="docutils literal"><span class="pre">get_structure</span></tt> method with a structure id and the name of the mmCIF file to create a structure object.</p>
<p>For lower-level access to an mmCIF file, you can use the
<tt class="docutils literal"><span class="pre">MMCIF2Dict</span></tt> class to create a Python dictionary that maps all mmCIF
tags in an mmCIF file to their values. If there are multiple values
(like in the case of tag <tt class="docutils literal"><span class="pre">_atom_site.Cartn_y</span></tt>, which holds the <em>y</em>
coordinates of all atoms), the tag is mapped to a list of values. The
dictionary is created from the mmCIF file as follows:</p>
<p>Example: get the solvent content from an mmCIF file:</p>
<p>Example: get the list of the <em>y</em> coordinates of all atoms</p>
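<p>Both examples can be sketched on a tiny, hypothetical mmCIF fragment. Passing a handle to <tt class="docutils literal"><span class="pre">MMCIF2Dict</span></tt> assumes a recent Biopython; with a real file, pass its name instead. Values are returned as strings, and depending on the Biopython version a single value may come back either as a string or as a one-element list.</p>

```python
from io import StringIO

from Bio.PDB.MMCIF2Dict import MMCIF2Dict

# Hypothetical mmCIF fragment: one single-valued tag and one two-row loop.
CIF_TEXT = """\
data_demo
_exptl_crystal.density_percent_sol 53.1
loop_
_atom_site.id
_atom_site.Cartn_y
1 6.362
2 5.000
"""

mmcif_dict = MMCIF2Dict(StringIO(CIF_TEXT))

# Solvent content (a string, or a one-element list in newer versions).
sc = mmcif_dict["_exptl_crystal.density_percent_sol"]

# y coordinates of all atoms, one string per row of the loop.
y_list = mmcif_dict["_atom_site.Cartn_y"]
print(sc, y_list)
```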
</div>
<div class="section" id="reading-files-in-the-pdb-xml-format">
<h3>11.1.3  Reading files in the PDB XML format<a class="headerlink" href="#reading-files-in-the-pdb-xml-format" title="Permalink to this headline">¶</a></h3>
<p>This format is not yet supported, but support is planned (it is not a
lot of work). Contact the Biopython developers
(<a class="reference external" href="mailto:biopython-dev&#37;&#52;&#48;biopython&#46;org">biopython-dev<span>&#64;</span>biopython<span>&#46;</span>org</a>)
if you need this.</p>
</div>
<div class="section" id="writing-pdb-files">
<h3>11.1.4  Writing PDB files<a class="headerlink" href="#writing-pdb-files" title="Permalink to this headline">¶</a></h3>
<p>Use the PDBIO class for this. It’s easy to write out specific parts of a
structure too, of course.</p>
<p>Example: saving a structure</p>
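<p>A minimal sketch of saving a structure. To stay self-contained it parses a hypothetical fragment and writes to an in-memory handle; <tt class="docutils literal"><span class="pre">save</span></tt> also accepts a file name such as <tt class="docutils literal"><span class="pre">"out.pdb"</span></tt>.</p>

```python
from io import StringIO

from Bio.PDB.PDBIO import PDBIO
from Bio.PDB.PDBParser import PDBParser

# Hypothetical two-residue fragment used only for illustration.
PDB_TEXT = """\
ATOM      1  N   GLY A   1       4.475   6.362   5.000  1.00 20.00
ATOM      2  CA  GLY A   1       5.000   5.000   5.000  1.00 20.00
ATOM      3  C   GLY A   1       6.526   5.000   5.000  1.00 20.00
ATOM      4  N   SER A   2       7.200   6.100   5.000  1.00 20.00
ATOM      5  CA  SER A   2       8.600   6.200   5.200  1.00 20.00
"""
structure = PDBParser(QUIET=True).get_structure("demo", StringIO(PDB_TEXT))

# Attach the structure to a PDBIO object and write it out.
pdb_io = PDBIO()
pdb_io.set_structure(structure)
out = StringIO()
pdb_io.save(out)

text = out.getvalue()
print(text)
```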
<p>If you want to write out a part of the structure, make use of the
<tt class="docutils literal"><span class="pre">Select</span></tt> class (also in <tt class="docutils literal"><span class="pre">PDBIO</span></tt>). Select has four methods:</p>
<ul class="simple">
<li><tt class="docutils literal"><span class="pre">accept_model(model)</span></tt></li>
<li><tt class="docutils literal"><span class="pre">accept_chain(chain)</span></tt></li>
<li><tt class="docutils literal"><span class="pre">accept_residue(residue)</span></tt></li>
<li><tt class="docutils literal"><span class="pre">accept_atom(atom)</span></tt></li>
</ul>
<p>By default, every method returns 1 (which means the
model/chain/residue/atom is included in the output). By subclassing
<tt class="docutils literal"><span class="pre">Select</span></tt> and returning 0 when appropriate you can exclude models,
chains, etc. from the output. Cumbersome maybe, but very powerful. The
following code only writes out glycine residues:</p>
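<p>A sketch of such a subclass, here called <tt class="docutils literal"><span class="pre">GlySelect</span></tt>. The input fragment is hypothetical and the output goes to an in-memory handle so that the example is self-contained.</p>

```python
from io import StringIO

from Bio.PDB.PDBIO import PDBIO, Select
from Bio.PDB.PDBParser import PDBParser


class GlySelect(Select):
    """Accept glycine residues only; everything else keeps the default."""

    def accept_residue(self, residue):
        return residue.get_resname() == "GLY"


# Hypothetical fragment with one Gly and one Ser residue.
PDB_TEXT = """\
ATOM      1  N   GLY A   1       4.475   6.362   5.000  1.00 20.00
ATOM      2  CA  GLY A   1       5.000   5.000   5.000  1.00 20.00
ATOM      3  N   SER A   2       7.200   6.100   5.000  1.00 20.00
ATOM      4  CA  SER A   2       8.600   6.200   5.200  1.00 20.00
"""
structure = PDBParser(QUIET=True).get_structure("demo", StringIO(PDB_TEXT))

pdb_io = PDBIO()
pdb_io.set_structure(structure)
out = StringIO()
pdb_io.save(out, GlySelect())  # second argument: the Select instance

text = out.getvalue()
print(text)
```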
<p>If this is all too complicated for you, the <tt class="docutils literal"><span class="pre">Dice</span></tt> module contains a
handy <tt class="docutils literal"><span class="pre">extract</span></tt> function that writes out all residues in a chain
between a start and end residue.</p>
</div>
</div>
<div class="section" id="structure-representation">
<h2>11.2  Structure representation<a class="headerlink" href="#structure-representation" title="Permalink to this headline">¶</a></h2>
<p>The overall layout of a <tt class="docutils literal"><span class="pre">Structure</span></tt> object follows the so-called SMCRA
(Structure/Model/Chain/Residue/Atom) architecture:</p>
<ul class="simple">
<li>A structure consists of models</li>
<li>A model consists of chains</li>
<li>A chain consists of residues</li>
<li>A residue consists of atoms</li>
</ul>
<p>This is the way many structural biologists/bioinformaticians think about
structure, and it provides a simple but efficient way to deal with
structure. Additional information is added only where it is needed. A UML
diagram of the <tt class="docutils literal"><span class="pre">Structure</span></tt> object (forget about the <tt class="docutils literal"><span class="pre">Disordered</span></tt>
classes for now) is shown in Fig. <a class="reference external" href="#fig:smcra">11.1</a>. Such a data
structure is not necessarily best suited for the representation of the
macromolecular content of a structure, but it is absolutely necessary
for a good interpretation of the data present in a file that describes
the structure (typically a PDB or mmCIF file). If this hierarchy cannot
represent the contents of a structure file, it is fairly certain that
the file contains an error or at least does not describe the structure
unambiguously. If a SMCRA data structure cannot be generated, there is
reason to suspect a problem. Parsing a PDB file can thus be used to
detect likely problems. We will give several examples of this in section
<a class="reference external" href="#problem%20structures">11.7.1</a>.</p>
<blockquote>
<div>Figure 11.1: UML diagram of SMCRA architecture of the <tt class="docutils literal"><span class="pre">Structure</span></tt> class used to represent a macromolecular structure. Full lines with diamonds denote aggregation, full lines with arrows denote referencing, full lines with triangles denote inheritance and dashed lines with triangles denote interface realization.</div></blockquote>
<p>Structure, Model, Chain and Residue are all subclasses of the Entity
base class. The Atom class only (partly) implements the Entity interface
(because an Atom does not have children).</p>
<p>For each Entity subclass, you can extract a child by using a unique id
for that child as a key (e.g. you can extract an Atom object from a
Residue object by using an atom name string as a key, you can extract a
Chain object from a Model object by using its chain identifier as a
key).</p>
<p>Disordered atoms and residues are represented by DisorderedAtom and
DisorderedResidue classes, which are both subclasses of the
DisorderedEntityWrapper base class. They hide the complexity associated
with disorder and behave exactly as Atom and Residue objects.</p>
<p>In general, a child Entity object (i.e. Atom, Residue, Chain, Model) can
be extracted from its parent (i.e. Residue, Chain, Model, Structure,
respectively) by using an id as a key.</p>
<p>You can also get a list of all child Entities of a parent Entity object.
Note that this list is sorted in a specific way (e.g. according to chain
identifier for Chain objects in a Model object).</p>
<p>You can also get the parent from a child:</p>
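<p>The three operations just described (child by key, list of children, parent of a child) can be sketched as follows on a hypothetical two-residue fragment.</p>

```python
from io import StringIO

from Bio.PDB.PDBParser import PDBParser

# Hypothetical two-residue fragment used only for illustration.
PDB_TEXT = """\
ATOM      1  N   GLY A   1       4.475   6.362   5.000  1.00 20.00
ATOM      2  CA  GLY A   1       5.000   5.000   5.000  1.00 20.00
ATOM      3  N   SER A   2       7.200   6.100   5.000  1.00 20.00
ATOM      4  CA  SER A   2       8.600   6.200   5.200  1.00 20.00
"""
structure = PDBParser(QUIET=True).get_structure("demo", StringIO(PDB_TEXT))

# Extract children from their parents by using the child id as a key.
model = structure[0]
chain = model["A"]
residue = chain[2]
atom = residue["CA"]

# List all children of a parent entity.
residues = chain.get_list()
atoms = residue.get_list()

# Walk back up from a child to its parent.
parent_chain = residue.get_parent()
print(residues, atoms, parent_chain)
```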
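<p>At all levels of the SMCRA hierarchy, you can also extract a <em>full id</em>.
The full id is a tuple containing all ids starting from the top object
(Structure) down to the current object. A full id for a Residue object
is, for example, <tt class="docutils literal"><span class="pre">("1abc",</span> <span class="pre">0,</span> <span class="pre">"A",</span> <span class="pre">("</span> <span class="pre">",</span> <span class="pre">10,</span> <span class="pre">"A"))</span></tt>.</p>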
<p>This corresponds to:</p>
<ul class="simple">
<li>The Structure with id &#8220;1abc&#8221;</li>
<li>The Model with id 0</li>
<li>The Chain with id &#8220;A&#8221;</li>
<li>The Residue with id (&#8220; &#8221;, 10, &#8220;A&#8221;).</li>
</ul>
<p>The Residue id indicates that the residue is not a hetero-residue (nor a
water) because it has a blank hetero field, that its sequence identifier
is 10 and that its insertion code is &#8220;A&#8221;.</p>
<p>To get the entity’s id, use the <tt class="docutils literal"><span class="pre">get_id</span></tt> method:</p>
<p>You can check if the entity has a child with a given id by using the
<tt class="docutils literal"><span class="pre">has_id</span></tt> method:</p>
<p>The length of an entity is equal to its number of children:</p>
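<p>A sketch of the full id, <tt class="docutils literal"><span class="pre">get_id</span></tt>, <tt class="docutils literal"><span class="pre">has_id</span></tt> and <tt class="docutils literal"><span class="pre">len</span></tt>, using a hypothetical fragment whose only residue is Asn 10 with insertion code A in chain A.</p>

```python
from io import StringIO

from Bio.PDB.PDBParser import PDBParser

# Hypothetical residue: Asn 10 with insertion code "A" in chain A.
PDB_TEXT = """\
ATOM      1  N   ASN A  10A      4.475   6.362   5.000  1.00 20.00
ATOM      2  CA  ASN A  10A      5.000   5.000   5.000  1.00 20.00
"""
structure = PDBParser(QUIET=True).get_structure("1abc", StringIO(PDB_TEXT))
chain = structure[0]["A"]
residue = chain[(" ", 10, "A")]

full_id = residue.get_full_id()  # ids from the Structure down to this Residue
residue_id = residue.get_id()    # id of this entity only
has_ca = residue.has_id("CA")    # is there a child with this id?
has_cb = residue.has_id("CB")
n_atoms = len(residue)           # number of children
print(full_id, residue_id, has_ca, has_cb, n_atoms)
```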
<p>It is possible to delete, rename, add, etc. child entities from a parent
entity, but this does not include any sanity checks (e.g. it is possible
to add two residues with the same id to one chain). This really should
be done via a nice Decorator class that includes integrity checking, but
you can take a look at the code (Entity.py) if you want to use the raw
interface.</p>
<div class="section" id="structure">
<h3>11.2.1  Structure<a class="headerlink" href="#structure" title="Permalink to this headline">¶</a></h3>
<p>The Structure object is at the top of the hierarchy. Its id is a user
given string. The Structure contains a number of Model children. Most
crystal structures (but not all) contain a single model, while NMR
structures typically consist of several models. Disorder in crystal
structures of large parts of molecules can also result in several
models.</p>
</div>
<div class="section" id="model">
<h3>11.2.2  Model<a class="headerlink" href="#model" title="Permalink to this headline">¶</a></h3>
<p>The id of the Model object is an integer, which is derived from the
position of the model in the parsed file (they are automatically
numbered starting from 0). Crystal structures generally have only one
model (with id 0), while NMR files usually have several models. Whereas
many PDB parsers assume that there is only one model, the <tt class="docutils literal"><span class="pre">Structure</span></tt>
class in <tt class="docutils literal"><span class="pre">Bio.PDB</span></tt> is designed such that it can easily handle PDB
files with more than one model.</p>
<p>As an example, to get the first model from a Structure object, use <tt class="docutils literal"><span class="pre">structure[0]</span></tt>.</p>
<p>The Model object stores a list of Chain children.</p>
</div>
<div class="section" id="chain">
<h3>11.2.3  Chain<a class="headerlink" href="#chain" title="Permalink to this headline">¶</a></h3>
<p>The id of a Chain object is derived from the chain identifier in the
PDB/mmCIF file, and is a single character (typically a letter). Each
Chain in a Model object has a unique id. As an example, to get the Chain
object with identifier “A” from a Model object, use <tt class="docutils literal"><span class="pre">model["A"]</span></tt>.</p>
<p>The Chain object stores a list of Residue children.</p>
</div>
<div class="section" id="residue">
<h3>11.2.4  Residue<a class="headerlink" href="#residue" title="Permalink to this headline">¶</a></h3>
<p>A residue id is a tuple with three elements:</p>
<ul>
<li><p class="first">The <strong>hetero-field</strong> (hetfield): this is</p>
<ul class="simple">
<li><tt class="docutils literal"><span class="pre">'W'</span></tt> in the case of a water molecule;</li>
<li><tt class="docutils literal"><span class="pre">'H_'</span></tt> followed by the residue name for other hetero residues
(e.g. <tt class="docutils literal"><span class="pre">'H_GLC'</span></tt> in the case of a glucose molecule);</li>
<li>blank for standard amino and nucleic acids.</li>
</ul>
<p>This scheme is adopted for reasons described in section
<a class="reference external" href="#hetero%20problems">11.4.1</a>.</p>
</li>
<li><p class="first">The <strong>sequence identifier</strong> (resseq), an integer describing the
position of the residue in the chain (e.g., 100);</p>
</li>
<li><p class="first">The <strong>insertion code</strong> (icode), a string, e.g. ’A’. The insertion
code is sometimes used to preserve a certain desirable residue
numbering scheme. For example, in a mutant with a Ser residue
inserted between Thr 80 and Asn 81, the residues could be numbered as
follows: Thr 80 A, Ser 80 B, Asn 81. In this
way the residue numbering scheme stays in tune with that of the wild
type structure.</p>
</li>
</ul>
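<p>The id of the above glucose residue would thus be
<tt class="docutils literal"><span class="pre">('H_GLC',</span> <span class="pre">100,</span> <span class="pre">'A')</span></tt>. If the hetero-field and insertion code are
blank, the sequence identifier alone can be used, e.g. <tt class="docutils literal"><span class="pre">chain[100]</span></tt> instead of <tt class="docutils literal"><span class="pre">chain[('</span> <span class="pre">',</span> <span class="pre">100,</span> <span class="pre">'</span> <span class="pre">')]</span></tt>.</p>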
<p>The reason for the hetero-field is that many PDB files use the same
sequence identifier for an amino acid and a hetero-residue or a water,
which would create obvious problems if the hetero-field were not used.</p>
<p>Unsurprisingly, a Residue object stores a set of Atom children. It also
contains a string that specifies the residue name (e.g. “ASN”) and the
segment identifier of the residue (well known to X-PLOR users, but not
used in the construction of the SMCRA data structure).</p>
<p>Let’s look at some examples. Asn 10 with a blank insertion code would
have residue id <tt class="docutils literal"><span class="pre">('</span> <span class="pre">',</span> <span class="pre">10,</span> <span class="pre">'</span> <span class="pre">')</span></tt>. Water 10 would have residue id
<tt class="docutils literal"><span class="pre">('W',</span> <span class="pre">10,</span> <span class="pre">'</span> <span class="pre">')</span></tt>. A glucose molecule (a hetero residue with residue
name GLC) with sequence identifier 10 would have residue id
<tt class="docutils literal"><span class="pre">('H_GLC',</span> <span class="pre">10,</span> <span class="pre">'</span> <span class="pre">')</span></tt>. In this way, the three residues (with the same
insertion code and sequence identifier) can be part of the same chain
because their residue ids are distinct.</p>
<p>In most cases, the hetfield and insertion code fields will be blank, e.g.
<tt class="docutils literal"><span class="pre">('</span> <span class="pre">',</span> <span class="pre">10,</span> <span class="pre">'</span> <span class="pre">')</span></tt>. In these cases, the sequence identifier can be used
as a shortcut for the full id:</p>
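<p>The three example residues and the shortcut can be sketched as follows. The fragment is hypothetical: it places Asn 10, a glucose and a water in chain A, all with sequence identifier 10.</p>

```python
from io import StringIO

from Bio.PDB.PDBParser import PDBParser

# Hypothetical chain in which three residues share sequence identifier 10.
PDB_TEXT = """\
ATOM      1  CA  ASN A  10       5.000   5.000   5.000  1.00 20.00
HETATM    2  C1  GLC A  10       1.000   2.000   3.000  1.00 30.00
HETATM    3  O   HOH A  10       9.000   9.000   9.000  1.00 30.00
"""
structure = PDBParser(QUIET=True).get_structure("demo", StringIO(PDB_TEXT))
chain = structure[0]["A"]

# Full id and integer shortcut refer to the same amino acid residue.
asn = chain[(" ", 10, " ")]
same_asn = chain[10]

# Hetero residues and waters always need the full id.
glucose = chain[("H_GLC", 10, " ")]
water = chain[("W", 10, " ")]
print(asn.get_id(), glucose.get_id(), water.get_id())
```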
<p>Each Residue object in a Chain object should have a unique id. However,
disordered residues are dealt with in a special way, as described in
section <a class="reference external" href="#point%20mutations">11.3.3</a>.</p>
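<p>A Residue object has a number of additional methods, including <tt class="docutils literal"><span class="pre">get_resname</span></tt>, <tt class="docutils literal"><span class="pre">is_disordered</span></tt>, <tt class="docutils literal"><span class="pre">get_segid</span></tt> and <tt class="docutils literal"><span class="pre">has_id</span></tt>.</p>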
<p>You can use <tt class="docutils literal"><span class="pre">is_aa(residue)</span></tt> to test if a Residue object is an amino
acid.</p>
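<p>A short sketch of these Residue methods and of <tt class="docutils literal"><span class="pre">is_aa</span></tt>, again on a hypothetical fragment (one Asn residue and one water).</p>

```python
from io import StringIO

from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import is_aa

# Hypothetical fragment: one amino acid and one water molecule.
PDB_TEXT = """\
ATOM      1  N   ASN A  10       4.475   6.362   5.000  1.00 20.00
ATOM      2  CA  ASN A  10       5.000   5.000   5.000  1.00 20.00
HETATM    3  O   HOH A 101       9.000   9.000   9.000  1.00 30.00
"""
structure = PDBParser(QUIET=True).get_structure("demo", StringIO(PDB_TEXT))
chain = structure[0]["A"]
residue = chain[10]
water = chain[("W", 101, " ")]

resname = residue.get_resname()       # residue name
disordered = residue.is_disordered()  # nonzero if it has disordered atoms
segid = residue.get_segid()           # segment identifier (blank here)
has_n = residue.has_id("N")           # does the residue contain this atom?

# is_aa distinguishes amino acids from waters and other hetero residues.
print(resname, disordered, repr(segid), has_n, is_aa(residue), is_aa(water))
```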
</div>
<div class="section" id="atom">
<h3>11.2.5  Atom<a class="headerlink" href="#atom" title="Permalink to this headline">¶</a></h3>
<p>The Atom object stores the data associated with an atom, and has no
children. The id of an atom is its atom name (e.g. “OG” for the side
chain oxygen of a Ser residue). An Atom id needs to be unique in a
Residue. Again, an exception is made for disordered atoms, as described
in section <a class="reference external" href="#disordered%20atoms">11.3.2</a>.</p>
<p>The atom id is simply the atom name (e.g. <tt class="docutils literal"><span class="pre">'CA'</span></tt>). In practice, the
atom name is created by stripping all spaces from the atom name in the
PDB file.</p>
<p>However, in PDB files, a space can be part of an atom name. Often,
calcium atoms are called <tt class="docutils literal"><span class="pre">'CA..'</span></tt> (the dots represent spaces) in order to distinguish them from Cα
atoms (which are called <tt class="docutils literal"><span class="pre">'.CA.'</span></tt>). In cases where stripping the spaces
would create problems (i.e. two atoms called <tt class="docutils literal"><span class="pre">'CA'</span></tt> in the same
residue) the spaces are kept.</p>
<p>In a PDB file, an atom name consists of 4 chars, typically with leading
and trailing spaces. Often these spaces can be removed for ease of use
(e.g. an amino acid Cα atom is labeled “.CA.” in a PDB file, where the
dots represent spaces). To generate an atom name (and thus an atom id)
the spaces are removed, unless this would result in a name collision in
a Residue (i.e. two Atom objects with the same atom name and id). In the
latter case, the atom name including spaces is tried. This situation can
e.g. happen when one residue contains atoms with names “.CA.” and
“CA..”, although this is not very likely.</p>
<p>The atomic data stored includes the atom name, the atomic coordinates
(including standard deviation if present), the B factor (including
anisotropic B factors and standard deviation if present), the altloc
specifier and the full atom name including spaces. Less used items like
the atom element number or the atomic charge sometimes specified in a
PDB file are not stored.</p>
<p>To manipulate the atomic coordinates, use the <tt class="docutils literal"><span class="pre">transform</span></tt> method of
the <tt class="docutils literal"><span class="pre">Atom</span></tt> object. Use the <tt class="docutils literal"><span class="pre">set_coord</span></tt> method to specify the atomic
coordinates directly.</p>
<p>An Atom object has the following additional methods:</p>
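<p>A sketch of the main Atom accessors on a hypothetical single-atom fragment. The fragment carries no ANISOU or SIGATM records, so the corresponding getters are expected to return nothing useful here.</p>

```python
from io import StringIO

from Bio.PDB.PDBParser import PDBParser

# Hypothetical single-atom fragment.
PDB_TEXT = """\
ATOM      1  CA  GLY A   1       5.000   5.000   5.000  1.00 20.00
"""
structure = PDBParser(QUIET=True).get_structure("demo", StringIO(PDB_TEXT))
atom = structure[0]["A"][1]["CA"]

name = atom.get_name()            # atom name with spaces stripped
atom_id = atom.get_id()           # id, identical to the name here
fullname = atom.get_fullname()    # four-character name including spaces
coord = atom.get_coord()          # NumPy array of x, y, z
vector = atom.get_vector()        # coordinates as a Vector object
bfactor = atom.get_bfactor()      # isotropic B factor
occupancy = atom.get_occupancy()  # occupancy
altloc = atom.get_altloc()        # alternative location specifier
sigatm = atom.get_sigatm()        # standard deviations of atomic parameters
siguij = atom.get_siguij()        # standard deviations of anisotropic B factor
anisou = atom.get_anisou()        # anisotropic B factor
print(name, repr(fullname), coord, bfactor, occupancy, repr(altloc))
```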
<p>NumPy arrays are used to represent the atom coordinates, siguij,
anisotropic B factor and sigatm.</p>
<p>The <tt class="docutils literal"><span class="pre">get_vector</span></tt> method returns a <tt class="docutils literal"><span class="pre">Vector</span></tt> object representation of
the coordinates of the <tt class="docutils literal"><span class="pre">Atom</span></tt> object, allowing you to do vector
operations on atomic coordinates. <tt class="docutils literal"><span class="pre">Vector</span></tt> implements the full set of
3D vector operations, matrix multiplication (left and right) and some
advanced rotation-related operations as well.</p>
<p>As an example of the capabilities of Bio.PDB’s <tt class="docutils literal"><span class="pre">Vector</span></tt> module,
suppose that you would like to find the position of a Gly residue’s Cβ
atom, if it had one. Rotating the N atom of the Gly residue along the
Cα-C bond over -120 degrees roughly puts it in the position of a virtual
Cβ atom. Here’s how to do it, making use of the <tt class="docutils literal"><span class="pre">rotaxis</span></tt> method
(which can be used to construct a rotation around a certain axis) of the
<tt class="docutils literal"><span class="pre">Vector</span></tt> module:</p>
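<p>The construction can be sketched as follows. The glycine coordinates are hypothetical values with roughly ideal bond lengths, and parsing from an in-memory handle assumes a recent Biopython.</p>

```python
from io import StringIO
from math import pi

from Bio.PDB import rotaxis
from Bio.PDB.PDBParser import PDBParser

# Hypothetical Gly backbone with roughly ideal geometry.
PDB_TEXT = """\
ATOM      1  N   GLY A   1       4.475   6.362   5.000  1.00 20.00
ATOM      2  CA  GLY A   1       5.000   5.000   5.000  1.00 20.00
ATOM      3  C   GLY A   1       6.526   5.000   5.000  1.00 20.00
"""
structure = PDBParser(QUIET=True).get_structure("demo", StringIO(PDB_TEXT))
residue = structure[0]["A"][1]

# Get atom coordinates as vectors.
n = residue["N"].get_vector()
c = residue["C"].get_vector()
ca = residue["CA"].get_vector()

# Center at the origin (CA becomes the origin).
n = n - ca
c = c - ca

# Rotation matrix for -120 degrees around the CA-C vector.
rot = rotaxis(-pi * 120.0 / 180.0, c)

# Apply the rotation to the CA-N vector, then move back onto CA.
cb_at_origin = n.left_multiply(rot)
cb = cb_at_origin + ca
print(cb)
```

<p>Because a rotation preserves lengths, the virtual Cβ ends up at the same distance from Cα as the N atom.</p>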
<p>This example shows that it’s possible to do some quite nontrivial vector
operations on atomic data, which can be quite useful. In addition to all
the usual vector operations (cross product (use <tt class="docutils literal"><span class="pre">**</span></tt>), dot product (use
<tt class="docutils literal"><span class="pre">*</span></tt>), angle, norm, etc.) and the above mentioned <tt class="docutils literal"><span class="pre">rotaxis</span></tt>
function, the <tt class="docutils literal"><span class="pre">Vector</span></tt> module also has methods to rotate (<tt class="docutils literal"><span class="pre">rotmat</span></tt>)
or reflect (<tt class="docutils literal"><span class="pre">refmat</span></tt>) one vector on top of another.</p>
</div>
<div class="section" id="extracting-a-specific-atom-residue-chain-model-from-a-structure">
<h3>11.2.6  Extracting a specific <tt class="docutils literal"><span class="pre">Atom/Residue/Chain/Model</span></tt> from a Structure<a class="headerlink" href="#extracting-a-specific-atom-residue-chain-model-from-a-structure" title="Permalink to this headline">¶</a></h3>
<p>These are some examples:</p>
<p>Note that you can use a shortcut:</p>
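<p>Both the step-by-step lookups and the chained shortcut can be sketched as follows, using a hypothetical residue 100 in chain A.</p>

```python
from io import StringIO

from Bio.PDB.PDBParser import PDBParser

# Hypothetical fragment: residue 100 of chain A.
PDB_TEXT = """\
ATOM      1  N   GLY A 100       4.475   6.362   5.000  1.00 20.00
ATOM      2  CA  GLY A 100       5.000   5.000   5.000  1.00 20.00
"""
structure = PDBParser(QUIET=True).get_structure("demo", StringIO(PDB_TEXT))

# Step by step, one level of the hierarchy at a time.
model = structure[0]
chain = model["A"]
residue = chain[100]
atom = residue["CA"]

# Shortcut: chain the lookups in a single expression.
same_atom = structure[0]["A"][100]["CA"]
print(atom, same_atom)
```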
</div>
</div>
<div class="section" id="disorder">
<h2>11.3  Disorder<a class="headerlink" href="#disorder" title="Permalink to this headline">¶</a></h2>
<p>Bio.PDB can handle both disordered atoms and point mutations (i.e. a Gly
and an Ala residue in the same position).</p>
<div class="section" id="general-approach">
<h3>11.3.1  General approach<a class="headerlink" href="#general-approach" title="Permalink to this headline">¶</a></h3>
<p>Disorder should be dealt with from two points of view: the atom and the
residue points of view. In general, we have tried to encapsulate all the
complexity that arises from disorder. If you just want to loop over all
Cα atoms, you do not care that some residues have a disordered side
chain. On the other hand it should also be possible to represent
disorder completely in the data structure. Therefore, disordered atoms
or residues are stored in special objects that behave as if there is no
disorder. This is done by only representing a subset of the disordered
atoms or residues. Which subset is picked (e.g. which of the two
disordered OG side chain atom positions of a Ser residue is used) can be
specified by the user.</p>
</div>
<div class="section" id="disordered-atoms">
<h3>11.3.2  Disordered atoms<a class="headerlink" href="#disordered-atoms" title="Permalink to this headline">¶</a></h3>
<p>Disordered atoms are represented by ordinary <tt class="docutils literal"><span class="pre">Atom</span></tt> objects, but all
<tt class="docutils literal"><span class="pre">Atom</span></tt> objects that represent the same physical atom are stored in a
<tt class="docutils literal"><span class="pre">DisorderedAtom</span></tt> object (see Fig. <a class="reference external" href="#fig:smcra">11.1</a>). Each
<tt class="docutils literal"><span class="pre">Atom</span></tt> object in a <tt class="docutils literal"><span class="pre">DisorderedAtom</span></tt> object can be uniquely indexed
using its altloc specifier. The <tt class="docutils literal"><span class="pre">DisorderedAtom</span></tt> object forwards all
uncaught method calls to the selected Atom object, by default the one
that represents the atom with the highest occupancy. The user can of
course change the selected <tt class="docutils literal"><span class="pre">Atom</span></tt> object, making use of its altloc
specifier. In this way atom disorder is represented correctly without
much additional complexity. In other words, if you are not interested in
atom disorder, you will not be bothered by it.</p>
<p>Each disordered atom has a characteristic altloc identifier. You can
specify that a <tt class="docutils literal"><span class="pre">DisorderedAtom</span></tt> object should behave like the <tt class="docutils literal"><span class="pre">Atom</span></tt>
object associated with a specific altloc identifier:</p>
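<p>A sketch using a hypothetical Ser residue whose OG atom has two alternative positions, A (occupancy 0.60) and B (occupancy 0.40).</p>

```python
from io import StringIO

from Bio.PDB.PDBParser import PDBParser

# Hypothetical Ser residue with a disordered OG atom (altlocs A and B).
PDB_TEXT = """\
ATOM      1  N   SER A   2       7.200   6.100   5.000  1.00 20.00
ATOM      2  CA  SER A   2       8.600   6.200   5.200  1.00 20.00
ATOM      3  CB  SER A   2       9.100   7.600   5.500  1.00 20.00
ATOM      4  OG ASER A   2      10.500   7.700   5.600  0.60 20.00
ATOM      5  OG BSER A   2       8.700   8.400   4.400  0.40 20.00
"""
structure = PDBParser(QUIET=True).get_structure("demo", StringIO(PDB_TEXT))
residue = structure[0]["A"][2]

# residue["OG"] is a DisorderedAtom wrapping both positions.
og = residue["OG"]
default_altloc = og.get_altloc()  # position with the highest occupancy

# Make the DisorderedAtom behave like the altloc B atom instead.
og.disordered_select("B")
selected_altloc = og.get_altloc()
selected_occupancy = og.get_occupancy()
print(default_altloc, selected_altloc, selected_occupancy)
```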
</div>
<div class="section" id="disordered-residues">
<h3>11.3.3  Disordered residues<a class="headerlink" href="#disordered-residues" title="Permalink to this headline">¶</a></h3>
<div class="section" id="common-case">
<h4>Common case<a class="headerlink" href="#common-case" title="Permalink to this headline">¶</a></h4>
<p>The most common case is a residue that contains one or more disordered
atoms. This is evidently solved by using DisorderedAtom objects to
represent the disordered atoms, and storing the DisorderedAtom object in
a Residue object just like ordinary Atom objects. The DisorderedAtom
will behave exactly like an ordinary atom (in fact the atom with the
highest occupancy) by forwarding all uncaught method calls to one of the
Atom objects (the selected Atom object) it contains.</p>
</div>
<div class="section" id="point-mutations">
<h4>Point mutations<a class="headerlink" href="#point-mutations" title="Permalink to this headline">¶</a></h4>
<p>A special case arises when disorder is due to a point mutation, i.e.
when two or more point mutants of a polypeptide are present in the
crystal. An example of this can be found in PDB structure 1EN2.</p>
<p>Since these residues belong to a different residue type (e.g. let’s say
Ser 60 and Cys 60) they should not be stored in a single <tt class="docutils literal"><span class="pre">Residue</span></tt>
object as in the common case. In this case, each residue is represented
by one <tt class="docutils literal"><span class="pre">Residue</span></tt> object, and both <tt class="docutils literal"><span class="pre">Residue</span></tt> objects are stored in a
single <tt class="docutils literal"><span class="pre">DisorderedResidue</span></tt> object (see Fig. <a class="reference external" href="#fig:smcra">11.1</a>).</p>
<p>The <tt class="docutils literal"><span class="pre">DisorderedResidue</span></tt> object forwards all uncaught methods to the
selected <tt class="docutils literal"><span class="pre">Residue</span></tt> object (by default the last <tt class="docutils literal"><span class="pre">Residue</span></tt> object
added), and thus behaves like an ordinary residue. Each <tt class="docutils literal"><span class="pre">Residue</span></tt>
object in a <tt class="docutils literal"><span class="pre">DisorderedResidue</span></tt> object can be uniquely identified by
its residue name. In the above example, residue Ser 60 would have id
“SER” in the <tt class="docutils literal"><span class="pre">DisorderedResidue</span></tt> object, while residue Cys 60 would
have id “CYS”. The user can select the active <tt class="docutils literal"><span class="pre">Residue</span></tt> object in a
<tt class="docutils literal"><span class="pre">DisorderedResidue</span></tt> object via this id.</p>
<p>Example: suppose that a chain has a point mutation at position 10,
consisting of a Ser and a Cys residue. Make sure that residue 10 of this
chain behaves as the Cys residue.</p>
<p>In addition, you can get a list of all <tt class="docutils literal"><span class="pre">Atom</span></tt> objects (i.e. all
<tt class="docutils literal"><span class="pre">DisorderedAtom</span></tt> objects are ’unpacked’ to their individual <tt class="docutils literal"><span class="pre">Atom</span></tt>
objects) using the <tt class="docutils literal"><span class="pre">get_unpacked_list</span></tt> method of a
<tt class="docutils literal"><span class="pre">(Disordered)Residue</span></tt> object.</p>
</div>
</div>
</div>
<div class="section" id="hetero-residues">
<h2>11.4  Hetero residues<a class="headerlink" href="#hetero-residues" title="Permalink to this headline">¶</a></h2>
<div class="section" id="associated-problems">
<h3>11.4.1  Associated problems<a class="headerlink" href="#associated-problems" title="Permalink to this headline">¶</a></h3>
<p>A common problem with hetero residues is that several hetero and
non-hetero residues present in the same chain share the same sequence
identifier (and insertion code). Therefore, to generate a unique id for
each hetero residue, waters and other hetero residues are treated in a
different way.</p>
<p>Remember that a Residue object has the tuple (hetfield, resseq, icode) as
its id. The hetfield is blank (“ ”) for amino and nucleic acids, and a
string for waters and other hetero residues. The content of the hetfield
is explained below.</p>
</div>
<div class="section" id="water-residues">
<h3>11.4.2  Water residues<a class="headerlink" href="#water-residues" title="Permalink to this headline">¶</a></h3>
<p>The hetfield string of a water residue consists of the letter “W”. So a
typical residue id for a water is (“W”, 1, “ ”).</p>
</div>
<div class="section" id="other-hetero-residues">
<h3>11.4.3  Other hetero residues<a class="headerlink" href="#other-hetero-residues" title="Permalink to this headline">¶</a></h3>
<p>The hetfield string for other hetero residues starts with “H_” followed
by the residue name. For example, a glucose molecule with residue name “GLC”
has hetfield “H_GLC”, so its residue id could be (“H_GLC”, 1,
“ ”).</p>
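<p>The id scheme for standard, water and other hetero residues can be
summarised in a few lines. This is a sketch only: <tt class="docutils literal"><span class="pre">make_residue_id</span></tt>
is a hypothetical helper, and the set of residue names treated as water is
an assumption made for the illustration.</p>

```python
# Assumption for this sketch: residue names that count as water.
WATER_NAMES = {"HOH", "WAT"}


def make_residue_id(resname, resseq, icode=" ", is_hetero=False):
    """Build a (hetfield, resseq, icode) tuple following the rules above."""
    if not is_hetero:
        hetfield = " "             # amino acids and nucleic acids
    elif resname in WATER_NAMES:
        hetfield = "W"             # waters
    else:
        hetfield = "H_" + resname  # any other hetero residue
    return (hetfield, resseq, icode)


# Three residues sharing sequence identifier 1 still get distinct ids.
ser_id = make_residue_id("SER", 1)
water_id = make_residue_id("HOH", 1, is_hetero=True)
glucose_id = make_residue_id("GLC", 1, is_hetero=True)
```

<p>Because the hetfield differs, the three ids are distinct even though the
sequence identifier and insertion code are identical.</p>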
</div>
</div>
<div class="section" id="navigating-through-a-structure-object">
<h2>11.5  Navigating through a Structure object<a class="headerlink" href="#navigating-through-a-structure-object" title="Permalink to this headline">¶</a></h2>
<p>There is a shortcut if you want to iterate over all atoms in a
structure:</p>
<p>Similarly, to iterate over all atoms in a chain, use</p>
<p>or if you want to iterate over all residues in a model:</p>
<p>You can also use the <tt class="docutils literal"><span class="pre">Selection.unfold_entities</span></tt> function to get all
residues from a structure:</p>
<p>or to get all atoms from a chain:</p>
<p>The level codes are <tt class="docutils literal"><span class="pre">A=atom,</span> <span class="pre">R=residue,</span> <span class="pre">C=chain,</span> <span class="pre">M=model,</span> <span class="pre">S=structure</span></tt>. You can
use this to go up in the hierarchy, e.g. to get a list of (unique)
<tt class="docutils literal"><span class="pre">Residue</span></tt> or <tt class="docutils literal"><span class="pre">Chain</span></tt> parents from a list of <tt class="docutils literal"><span class="pre">Atom</span></tt> objects:</p>
<p>For more info, see the API documentation.</p>
<p>Selecting altloc A for every disordered atom makes the SMCRA data
structure behave as if only the atoms with altloc A are present.</p>
<p>To extract polypeptides from a structure, construct a list of
<tt class="docutils literal"><span class="pre">Polypeptide</span></tt> objects from a <tt class="docutils literal"><span class="pre">Structure</span></tt> object using
<tt class="docutils literal"><span class="pre">PolypeptideBuilder</span></tt> as follows:</p>
<p>A Polypeptide object is simply a UserList of Residue objects, and is
always created from a single Model (in this case model 1). You can use
the resulting <tt class="docutils literal"><span class="pre">Polypeptide</span></tt> object to get the sequence as a <tt class="docutils literal"><span class="pre">Seq</span></tt>
object or to get a list of Cα atoms as well. Polypeptides can be built
using a C-N or a Cα-Cα distance criterion.</p>
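<p>The C-N distance criterion can be pictured as follows. This is a
minimal sketch, not the Bio.PDB implementation: residues are plain
dictionaries holding the coordinates of their N and C atoms, and both the
helper name <tt class="docutils literal"><span class="pre">split_into_polypeptides</span></tt> and the cutoff value are
assumptions chosen for the illustration.</p>

```python
import math


def split_into_polypeptides(residues, max_c_n_distance=1.8):
    """Group consecutive residues while the C-N distance stays under the cutoff.

    Each residue is a dict with "N" and "C" coordinate tuples (in Å).
    """
    polypeptides = []
    current = []
    for residue in residues:
        # Start a new segment when the previous C is too far from this N.
        if current and math.dist(current[-1]["C"], residue["N"]) > max_c_n_distance:
            polypeptides.append(current)
            current = []
        current.append(residue)
    if current:
        polypeptides.append(current)
    return polypeptides


residues = [
    {"name": "ALA", "N": (0.0, 0.0, 0.0), "C": (1.0, 0.0, 0.0)},
    {"name": "GLY", "N": (2.3, 0.0, 0.0), "C": (3.3, 0.0, 0.0)},
    {"name": "SER", "N": (10.0, 0.0, 0.0), "C": (11.0, 0.0, 0.0)},  # chain break
]
segments = split_into_polypeptides(residues)
segment_names = [[r["name"] for r in segment] for segment in segments]
```

<p>A Cα-Cα criterion works the same way, with the Cα coordinates of
neighbouring residues and a correspondingly larger cutoff.</p>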
<p>Example:</p>
<p>Note that in the above case only model 0 of the structure is considered
by <tt class="docutils literal"><span class="pre">PolypeptideBuilder</span></tt>. However, it is possible to use
<tt class="docutils literal"><span class="pre">PolypeptideBuilder</span></tt> to build <tt class="docutils literal"><span class="pre">Polypeptide</span></tt> objects from <tt class="docutils literal"><span class="pre">Model</span></tt>
and <tt class="docutils literal"><span class="pre">Chain</span></tt> objects as well.</p>
<p>The first thing to do is to extract all polypeptides from the structure
(as above). The sequence of each polypeptide can then easily be obtained
from the <tt class="docutils literal"><span class="pre">Polypeptide</span></tt> objects. The sequence is represented as a
Biopython <tt class="docutils literal"><span class="pre">Seq</span></tt> object, and its alphabet is defined by a
<tt class="docutils literal"><span class="pre">ProteinAlphabet</span></tt> object.</p>
<p>Example:</p>
</div>
<div class="section" id="analyzing-structures">
<h2>11.6  Analyzing structures<a class="headerlink" href="#analyzing-structures" title="Permalink to this headline">¶</a></h2>
<div class="section" id="measuring-distances">
<h3>11.6.1  Measuring distances<a class="headerlink" href="#measuring-distances" title="Permalink to this headline">¶</a></h3>
<p>The minus operator for atoms has been overloaded to return the distance
between two atoms.</p>
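<p>The idea of an overloaded minus operator can be shown with a toy class.
<tt class="docutils literal"><span class="pre">Point3D</span></tt> below is a hypothetical stand-in for an atom, written only
to illustrate the mechanism; it is not the Bio.PDB <tt class="docutils literal"><span class="pre">Atom</span></tt> class.</p>

```python
import math


class Point3D:
    """Toy atom: a name plus Cartesian coordinates (in Å)."""

    def __init__(self, name, x, y, z):
        self.name = name
        self.coord = (x, y, z)

    def __sub__(self, other):
        # Overloaded minus: Euclidean distance between the two positions.
        return math.dist(self.coord, other.coord)


ca1 = Point3D("CA", 0.0, 0.0, 0.0)
ca2 = Point3D("CA", 3.0, 4.0, 0.0)
distance = ca1 - ca2
```

<p>With this definition the expression <tt class="docutils literal"><span class="pre">ca1</span> <span class="pre">-</span> <span class="pre">ca2</span></tt> reads naturally as
“the distance between the two atoms”, and the result is symmetric in its
operands.</p>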
</div>
<div class="section" id="measuring-angles">
<h3>11.6.2  Measuring angles<a class="headerlink" href="#measuring-angles" title="Permalink to this headline">¶</a></h3>
<p>Use the vector representation of the atomic coordinates, and the
<tt class="docutils literal"><span class="pre">calc_angle</span></tt> function from the <tt class="docutils literal"><span class="pre">Vector</span></tt> module:</p>
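<p>For reference, the quantity being computed is the angle at the middle
point of three positions. The sketch below works on plain coordinate tuples
and returns radians; <tt class="docutils literal"><span class="pre">angle_between</span></tt> is a hypothetical helper for
illustration, not the Bio.PDB function.</p>

```python
import math


def angle_between(p1, p2, p3):
    """Angle in radians at p2, formed by the points p1-p2-p3."""
    v1 = [a - b for a, b in zip(p1, p2)]  # vector from p2 to p1
    v2 = [a - b for a, b in zip(p3, p2)]  # vector from p2 to p3
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


right_angle = angle_between((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
straight = angle_between((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (-2.0, 0.0, 0.0))
```

<p>Use <tt class="docutils literal"><span class="pre">math.degrees</span></tt> to convert the result if degrees are preferred.</p>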
</div>
<div class="section" id="measuring-torsion-angles">
<h3>11.6.3  Measuring torsion angles<a class="headerlink" href="#measuring-torsion-angles" title="Permalink to this headline">¶</a></h3>
<p>Use the vector representation of the atomic coordinates, and the
<tt class="docutils literal"><span class="pre">calc_dihedral</span></tt> function from the <tt class="docutils literal"><span class="pre">Vector</span></tt> module:</p>
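<p>A torsion (dihedral) angle is defined by four consecutive positions.
The following sketch uses the standard atan2 formulation on plain coordinate
tuples and returns radians in the range (-π, π]. The helper names are
hypothetical, and the sign convention is the one stated in the docstring;
it is shown for illustration rather than as the Bio.PDB implementation.</p>

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p1, p2, p3, p4):
    """Torsion angle in radians around the p2-p3 bond.

    Positive when, looking from p2 towards p3, p1 must be rotated
    clockwise to eclipse p4.
    """
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal of the plane through p2, p3, p4
    y = _dot(b2, _cross(n1, n2)) / math.hypot(*b2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


p1, p2, p3 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
cis = dihedral(p1, p2, p3, (1.0, 0.0, 1.0))
quarter_turn = dihedral(p1, p2, p3, (0.0, 1.0, 1.0))
trans = dihedral(p1, p2, p3, (-1.0, 0.0, 1.0))
```

<p>The three test geometries give a cis (0), a quarter-turn (π/2) and a
trans (π) arrangement, which is a quick way to check the sign convention.</p>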
</div>
<div class="section" id="determining-atom-atom-contacts">
<h3>11.6.4  Determining atom-atom contacts<a class="headerlink" href="#determining-atom-atom-contacts" title="Permalink to this headline">¶</a></h3>
<p>Use <tt class="docutils literal"><span class="pre">NeighborSearch</span></tt> to perform neighbor lookup. The neighbor lookup
is done using a KD tree module written in C (see <tt class="docutils literal"><span class="pre">Bio.KDTree</span></tt>), making
it very fast. It also includes a fast method to find all point pairs
within a certain distance of each other.</p>
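<p>To make the “all point pairs within a certain distance” query concrete,
here is a brute-force reference version. It deliberately does not use a KD
tree: it compares every pair, which is quadratic in the number of points,
and is only meant to show what result the fast lookup is expected to
produce. <tt class="docutils literal"><span class="pre">pairs_within</span></tt> is a hypothetical helper.</p>

```python
import math
from itertools import combinations


def pairs_within(coords, radius):
    """Return index pairs (i, j), i < j, whose points lie within radius."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if math.dist(a, b) <= radius
    ]


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.5, 0.0, 0.0)]
close_pairs = pairs_within(coords, 2.0)
```

<p>A spatial index such as a KD tree returns the same pairs without
examining every combination, which matters for structures with many
thousands of atoms.</p>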
</div>
<div class="section" id="superimposing-two-structures">
<h3>11.6.5  Superimposing two structures<a class="headerlink" href="#superimposing-two-structures" title="Permalink to this headline">¶</a></h3>
<p>Use a <tt class="docutils literal"><span class="pre">Superimposer</span></tt> object to superimpose two coordinate sets. This
object calculates the rotation and translation matrix that rotates two
lists of atoms on top of each other in such a way that their RMSD is
minimized. Of course, the two lists need to contain the same number of
atoms. The <tt class="docutils literal"><span class="pre">Superimposer</span></tt> object can also apply the
rotation/translation to a list of atoms. The rotation and translation
are stored as a tuple in the <tt class="docutils literal"><span class="pre">rotran</span></tt> attribute of the
<tt class="docutils literal"><span class="pre">Superimposer</span></tt> object (note that the rotation is right multiplying!).
The RMSD is stored in the <tt class="docutils literal"><span class="pre">rmsd</span></tt> attribute.</p>
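<p>The remark about right multiplication means that each coordinate is
treated as a row vector: the new position is the coordinate times the
rotation matrix, plus the translation. The sketch below shows only this
application step and an RMSD calculation on plain Python lists; it does not
perform the fitting itself, and the helper names are hypothetical.</p>

```python
import math


def apply_rotran(coords, rot, tran):
    """Transform each coordinate as (row vector) x rot + tran."""
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            x * rot[0][k] + y * rot[1][k] + z * rot[2][k] + tran[k]
            for k in range(3)
        ))
    return moved


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long coordinate lists."""
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Quarter turn about the z axis (for row vectors), then a shift of 1 Å along z.
rot = [[0.0, 1.0, 0.0],
       [-1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0]]
tran = (0.0, 0.0, 1.0)

moving = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
moved = apply_rotran(moving, rot, tran)
deviation = rmsd(moving, moved)
```

<p>With the opposite (left-multiplying) convention the same matrix would
rotate in the other direction, so it is worth checking a known point before
applying a stored rotation to a whole molecule.</p>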
<p>The algorithm used by <tt class="docutils literal"><span class="pre">Superimposer</span></tt> comes from [<a class="reference external" href="#golub1989">17</a>,
Golub &amp; Van Loan] and makes use of singular value decomposition (this is
implemented in the general <tt class="docutils literal"><span class="pre">Bio.SVDSuperimposer</span></tt> module).</p>
<p>Example:</p>
<p>To superimpose two structures based on their active sites, use the
active site atoms to calculate the rotation/translation matrices (as
above), and apply these to the whole molecule.</p>
</div>
<div class="section" id="mapping-the-residues-of-two-related-structures-onto-each-other">
<h3>11.6.6  Mapping the residues of two related structures onto each other<a class="headerlink" href="#mapping-the-residues-of-two-related-structures-onto-each-other" title="Permalink to this headline">¶</a></h3>
<p>First, create an alignment file in FASTA format, then use the
<tt class="docutils literal"><span class="pre">StructureAlignment</span></tt> class. This class can also be used for alignments
with more than two structures.</p>
</div>
<div class="section" id="calculating-the-half-sphere-exposure">
<h3>11.6.7  Calculating the Half Sphere Exposure<a class="headerlink" href="#calculating-the-half-sphere-exposure" title="Permalink to this headline">¶</a></h3>
<p>Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure
[<a class="reference external" href="#hamelryck2005">20</a>]. Basically, it counts the number of Cα atoms
around a residue in the direction of its side chain, and in the opposite
direction (within a radius of 13 Å). Despite its simplicity, it
outperforms many other measures of solvent exposure.</p>
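<p>The counting behind HSE can be sketched in a few lines. The function
below takes the Cα position of one residue, a vector pointing in the
direction of its side chain, and the Cα positions of the other residues,
and returns the number of neighbours in each half sphere. It is an
illustration only: <tt class="docutils literal"><span class="pre">half_sphere_exposure</span></tt> is a hypothetical helper,
and counting points that lie exactly on the dividing plane as “up” is an
arbitrary choice made here.</p>

```python
import math


def half_sphere_exposure(ca, side_chain_direction, other_cas, radius=13.0):
    """Count Cα atoms within radius in the side-chain half sphere and opposite it."""
    up = down = 0
    for other in other_cas:
        d = tuple(o - c for o, c in zip(other, ca))
        if math.hypot(*d) > radius:
            continue  # outside the sphere: not counted at all
        # The sign of the dot product decides which half sphere the atom is in.
        if sum(a * b for a, b in zip(d, side_chain_direction)) >= 0:
            up += 1
        else:
            down += 1
    return up, down


ca = (0.0, 0.0, 0.0)
direction = (0.0, 0.0, 1.0)  # side chain points along +z in this toy case
neighbours = [(0.0, 0.0, 5.0), (1.0, 0.0, 3.0), (0.0, 0.0, -4.0), (0.0, 0.0, 20.0)]
hse_up, hse_down = half_sphere_exposure(ca, direction, neighbours)
```

<p>The sum of the two counts is the ordinary contact number for the same
radius; the split into two half spheres is what adds the second
dimension.</p>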
<p>HSE comes in two flavors: HSEα and HSEβ. The former only uses the Cα
atom positions, while the latter uses the Cα and Cβ atom positions. The
HSE measure is calculated by the <tt class="docutils literal"><span class="pre">HSExposure</span></tt> class, which can also
calculate the contact number. This class has methods which return
dictionaries that map a <tt class="docutils literal"><span class="pre">Residue</span></tt> object to its corresponding HSEα,
HSEβ and contact number values.</p>
<p>Example:</p>
</div>
<div class="section" id="determining-the-secondary-structure">
<h3>11.6.8  Determining the secondary structure<a class="headerlink" href="#determining-the-secondary-structure" title="Permalink to this headline">¶</a></h3>
<p>For this functionality, you need to install DSSP (and obtain a license
for it — free for academic use, see
<a class="reference external" href="http://www.cmbi.kun.nl/gv/dssp/">http://www.cmbi.kun.nl/gv/dssp/</a>).
Then use the <tt class="docutils literal"><span class="pre">DSSP</span></tt> class, which maps <tt class="docutils literal"><span class="pre">Residue</span></tt> objects to their
secondary structure (and accessible surface area). The DSSP codes are
listed in Table <a class="reference external" href="#cap:DSSP-codes">11.1</a>. Note that DSSP (the program,
and thus by consequence the class) cannot handle multiple models!</p>
<blockquote>
<div><table border="1" class="docutils">
<colgroup>
<col width="22%" />
<col width="78%" />
</colgroup>
<tbody valign="top">
<tr class="row-odd"><td>Code</td>
<td>Secondary structure</td>
</tr>
<tr class="row-even"><td>H</td>
<td>α-helix</td>
</tr>
<tr class="row-odd"><td>B</td>
<td>Isolated β-bridge residue</td>
</tr>
<tr class="row-even"><td>E</td>
<td>Strand</td>
</tr>
<tr class="row-odd"><td>G</td>
<td>3-10 helix</td>
</tr>
<tr class="row-even"><td>I</td>
<td>Π-helix</td>
</tr>
<tr class="row-odd"><td>T</td>
<td>Turn</td>
</tr>
<tr class="row-even"><td>S</td>
<td>Bend</td>
</tr>
<tr class="row-odd"><td>-</td>
<td>Other</td>
</tr>
</tbody>
</table>
<table border="1" class="docutils">
<colgroup>
<col width="100%" />
</colgroup>
<tbody valign="top">
<tr class="row-odd"><td>Table 11.1: DSSP codes in Bio.PDB.</td>
</tr>
</tbody>
</table>
</div></blockquote>
<p>The <tt class="docutils literal"><span class="pre">DSSP</span></tt> class can also be used to calculate the accessible surface
area of a residue. But see also section
<a class="reference external" href="#subsec:residue_depth">11.6.9</a>.</p>
</div>
<div class="section" id="calculating-the-residue-depth">
<h3>11.6.9  Calculating the residue depth<a class="headerlink" href="#calculating-the-residue-depth" title="Permalink to this headline">¶</a></h3>
<p>Residue depth is the average distance of a residue’s atoms from the
solvent accessible surface. It’s a fairly new and very powerful
parameterization of solvent accessibility. For this functionality, you
need to install Michel Sanner’s MSMS program
(<a class="reference external" href="http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html">http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html</a>).
Then use the <tt class="docutils literal"><span class="pre">ResidueDepth</span></tt> class. This class behaves as a dictionary
which maps <tt class="docutils literal"><span class="pre">Residue</span></tt> objects to corresponding (residue depth, Cα
depth) tuples. The Cα depth is the distance of a residue’s Cα atom to
the solvent accessible surface.</p>
<p>Example:</p>
<p>You can also get access to the molecular surface itself (via the
<tt class="docutils literal"><span class="pre">get_surface</span></tt> function), in the form of a NumPy array with
the surface points.</p>
</div>
</div>
<div class="section" id="common-problems-in-pdb-files">
<h2>11.7  Common problems in PDB files<a class="headerlink" href="#common-problems-in-pdb-files" title="Permalink to this headline">¶</a></h2>
<p>It is well known that many PDB files contain semantic errors (not the
structures themselves, but their representation in PDB files). Bio.PDB
handles this through the PDBParser object, which can behave in two
ways: a restrictive way and a permissive way, which is the default.</p>
<p>Example:</p>
<p>In the permissive state (DEFAULT), PDB files that obviously contain
errors are “corrected” (i.e. some residues or atoms are left out). These
errors include:</p>
<ul class="simple">
<li>Multiple residues with the same identifier</li>
<li>Multiple atoms with the same identifier (taking into account the
altloc identifier)</li>
</ul>
<p>These errors indicate real problems in the PDB file (for details see
[<a class="reference external" href="#hamelryck2003a">18</a>, Hamelryck and Manderick, 2003]). In the
restrictive state, PDB files with errors cause an exception to occur.
This is useful to find errors in PDB files.</p>
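<p>The difference between the two states can be pictured with a toy
collector for records that should have unique identifiers. This is a sketch
under stated assumptions, not the parser's code: <tt class="docutils literal"><span class="pre">collect_unique</span></tt> and
<tt class="docutils literal"><span class="pre">DuplicateIdError</span></tt> are hypothetical names, and keeping the first
record of a duplicate pair is an assumption made for the illustration. In
Bio.PDB itself the behaviour is chosen with the <tt class="docutils literal"><span class="pre">PERMISSIVE</span></tt>
argument of <tt class="docutils literal"><span class="pre">PDBParser</span></tt>.</p>

```python
class DuplicateIdError(Exception):
    """Raised in the restrictive state when an identifier occurs twice."""


def collect_unique(records, permissive=True):
    """Collect (identifier, payload) records, handling duplicate identifiers.

    Permissive: duplicates are left out and reported in the second result.
    Restrictive: the first duplicate raises DuplicateIdError.
    """
    kept = {}
    skipped = []
    for identifier, payload in records:
        if identifier in kept:
            if not permissive:
                raise DuplicateIdError(f"duplicate identifier: {identifier!r}")
            skipped.append((identifier, payload))
            continue
        kept[identifier] = payload
    return kept, skipped


# Two residues in one chain claim the same (hetfield, resseq, icode) id.
records = [
    ((" ", 3, " "), "THR"),
    ((" ", 202, " "), "GLY"),
    ((" ", 3, " "), "LEU"),
]
kept, skipped = collect_unique(records)
```

<p>Running the same input with <tt class="docutils literal"><span class="pre">permissive=False</span></tt> raises the exception
instead, which is the quickest way to locate the offending record in a
file.</p>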
<div class="section" id="examples">
<h3>11.7.1  Examples<a class="headerlink" href="#examples" title="Permalink to this headline">¶</a></h3>
<p>The PDBParser/Structure class was tested on about 800 structures (each
belonging to a unique SCOP superfamily). This takes about 20 minutes, or
on average 1.5 seconds per structure. Parsing the structure of the large
ribosomal subunit (1FFK), which contains about 64000 atoms, takes 10
seconds on a 1000 MHz PC.</p>
<p>Three exceptions were generated in cases where an unambiguous data
structure could not be built. In all three cases, the likely cause is an
error in the PDB file that should be corrected. Generating an exception
in these cases is much better than running the chance of incorrectly
describing the structure in a data structure.</p>
<div class="section" id="duplicate-residues">
<h4>11.7.1.1  Duplicate residues<a class="headerlink" href="#duplicate-residues" title="Permalink to this headline">¶</a></h4>
<p>One structure contains two amino acid residues in one chain with the
same sequence identifier (resseq 3) and icode. Upon inspection it was
found that this chain contains the residues Thr A3, …, Gly A202, Leu A3,
Glu A204. Clearly, Leu A3 should be Leu A203. A couple of similar
situations exist for structure 1FFK (which e.g. contains Gly B64, Met
B65, Glu B65, Thr B67, i.e. residue Glu B65 should be Glu B66).</p>
</div>
<div class="section" id="duplicate-atoms">
<h4>11.7.1.2  Duplicate atoms<a class="headerlink" href="#duplicate-atoms" title="Permalink to this headline">¶</a></h4>
<p>Structure 1EJG contains a Ser/Pro point mutation in chain A at position
22. In turn, Ser 22 contains some disordered atoms. As expected, all
atoms belonging to Ser 22 have a non-blank altloc specifier (B or C).
All atoms of Pro 22 have altloc A, except the N atom which has a blank
altloc. This generates an exception, because all atoms belonging to two
residues at a point mutation should have non-blank altloc. It turns out
that this atom is probably shared by Ser and Pro 22, as Ser 22 misses
the N atom. Again, this points to a problem in the file: the N atom
should be present in both the Ser and the Pro residue, in both cases
associated with a suitable altloc identifier.</p>
</div>
</div>
<div class="section" id="automatic-correction">
<h3>11.7.2  Automatic correction<a class="headerlink" href="#automatic-correction" title="Permalink to this headline">¶</a></h3>
<p>Some errors are quite common and can be easily corrected without much
risk of making a wrong interpretation. These cases are listed below.</p>
<div class="section" id="a-blank-altloc-for-a-disordered-atom">
<h4>11.7.2.1  A blank altloc for a disordered atom<a class="headerlink" href="#a-blank-altloc-for-a-disordered-atom" title="Permalink to this headline">¶</a></h4>
<p>Normally each disordered atom should have a non-blank altloc identifier.
However, there are many structures that do not follow this convention,
and have a blank and a non-blank identifier for two disordered positions
of the same atom. This is automatically interpreted in the right way.</p>
</div>
<div class="section" id="broken-chains">
<h4>11.7.2.2  Broken chains<a class="headerlink" href="#broken-chains" title="Permalink to this headline">¶</a></h4>
<p>Sometimes a structure contains a list of residues belonging to chain A,
followed by residues belonging to chain B, and again followed by
residues belonging to chain A, i.e. the chains are “broken”. This is
correctly interpreted.</p>
</div>
</div>
<div class="section" id="fatal-errors">
<h3>11.7.3  Fatal errors<a class="headerlink" href="#fatal-errors" title="Permalink to this headline">¶</a></h3>
<p>Sometimes a PDB file cannot be unambiguously interpreted. Rather than
guessing and risking a mistake, an exception is generated, and the user
is expected to correct the PDB file. These cases are listed below.</p>
<div class="section" id="id1">
<h4>11.7.3.1  Duplicate residues<a class="headerlink" href="#id1" title="Permalink to this headline">¶</a></h4>
<p>All residues in a chain should have a unique id. This id is generated
based on:</p>
<ul class="simple">
<li>The sequence identifier (resseq).</li>
<li>The insertion code (icode).</li>
<li>The hetfield string (“W” for waters and “H_” followed by the residue
name for other hetero residues)</li>
<li>The residue names of the residues in the case of point mutations (to
store the Residue objects in a DisorderedResidue object).</li>
</ul>
<p>If this does not lead to a unique id something is quite likely wrong,
and an exception is generated.</p>
</div>
<div class="section" id="id2">
<h4>11.7.3.2  Duplicate atoms<a class="headerlink" href="#id2" title="Permalink to this headline">¶</a></h4>
<p>All atoms in a residue should have a unique id. This id is generated
based on:</p>
<ul class="simple">
<li>The atom name (without spaces, or with spaces if a problem arises).</li>
<li>The altloc specifier.</li>
</ul>
<p>If this does not lead to a unique id something is quite likely wrong,
and an exception is generated.</p>
</div>
</div>
</div>
<div class="section" id="accessing-the-protein-data-bank">
<h2>11.8  Accessing the Protein Data Bank<a class="headerlink" href="#accessing-the-protein-data-bank" title="Permalink to this headline">¶</a></h2>
<div class="section" id="downloading-structures-from-the-protein-data-bank">
<h3>11.8.1  Downloading structures from the Protein Data Bank<a class="headerlink" href="#downloading-structures-from-the-protein-data-bank" title="Permalink to this headline">¶</a></h3>
<p>Structures can be downloaded from the PDB (Protein Data Bank) by using
the <tt class="docutils literal"><span class="pre">retrieve_pdb_file</span></tt> method on a <tt class="docutils literal"><span class="pre">PDBList</span></tt> object. The argument
for this method is the PDB identifier of the structure.</p>
<p>The <tt class="docutils literal"><span class="pre">PDBList</span></tt> class can also be used as a command-line tool:</p>
<p>The downloaded file will be called <tt class="docutils literal"><span class="pre">pdb1fat.ent</span></tt> and stored in the
current working directory. Note that the <tt class="docutils literal"><span class="pre">retrieve_pdb_file</span></tt> method
also has an optional argument <tt class="docutils literal"><span class="pre">pdir</span></tt> that specifies a specific
directory in which to store the downloaded PDB files.</p>
<p>The <tt class="docutils literal"><span class="pre">retrieve_pdb_file</span></tt> method also has some options to specify the
compression format used for the download, and the program used for local
decompression (default <tt class="docutils literal"><span class="pre">.Z</span></tt> format and <tt class="docutils literal"><span class="pre">gunzip</span></tt>). In addition, the
PDB ftp site can be specified upon creation of the <tt class="docutils literal"><span class="pre">PDBList</span></tt> object.
By default, the server of the Worldwide Protein Data Bank
(<a class="reference external" href="ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/">ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/</a>)
is used. See the API documentation for more details. Thanks again to
Kristian Rother for donating this module.</p>
</div>
<div class="section" id="downloading-the-entire-pdb">
<h3>11.8.2  Downloading the entire PDB<a class="headerlink" href="#downloading-the-entire-pdb" title="Permalink to this headline">¶</a></h3>
<p>The following commands will store all PDB files in the <tt class="docutils literal"><span class="pre">/data/pdb</span></tt>
directory:</p>
<p>The API method for this is called <tt class="docutils literal"><span class="pre">download_entire_pdb</span></tt>. Adding the
<tt class="docutils literal"><span class="pre">-d</span></tt> option will store all files in the same directory. Otherwise,
they are sorted into PDB-style subdirectories according to their PDB
IDs. Depending on the traffic, a complete download will take 2-4 days.</p>
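<p>The two storage layouts can be illustrated with a small path helper.
The file name pattern follows the <tt class="docutils literal"><span class="pre">pdb1fat.ent</span></tt> example given earlier,
and the subdirectory is taken from the two middle characters of the PDB ID,
as in the “divided” layout on the wwPDB server. That the local copy uses
exactly this layout is an assumption of the sketch, and
<tt class="docutils literal"><span class="pre">local_pdb_path</span></tt> is a hypothetical function, not part of
<tt class="docutils literal"><span class="pre">PDBList</span></tt>.</p>

```python
from pathlib import PurePosixPath


def local_pdb_path(pdb_id, pdir, flat=False):
    """Return the expected location of a downloaded entry below pdir."""
    code = pdb_id.lower()
    filename = f"pdb{code}.ent"
    if flat:
        # All files in one directory.
        return PurePosixPath(pdir) / filename
    # Divided layout: subdirectory named after the two middle characters.
    return PurePosixPath(pdir) / code[1:3] / filename


divided = str(local_pdb_path("1FAT", "/data/pdb"))
flat = str(local_pdb_path("1FAT", "/data/pdb", flat=True))
```

<p>The divided layout keeps the number of files per directory manageable,
which is helpful when mirroring the complete archive.</p>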
</div>
<div class="section" id="keeping-a-local-copy-of-the-pdb-up-to-date">
<h3>11.8.3  Keeping a local copy of the PDB up to date<a class="headerlink" href="#keeping-a-local-copy-of-the-pdb-up-to-date" title="Permalink to this headline">¶</a></h3>
<p>This can also be done using the <tt class="docutils literal"><span class="pre">PDBList</span></tt> object. One simply creates a
<tt class="docutils literal"><span class="pre">PDBList</span></tt> object (specifying the directory where the local copy of the
PDB is present) and calls the <tt class="docutils literal"><span class="pre">update_pdb</span></tt> method:</p>
<p>One can of course make a weekly <tt class="docutils literal"><span class="pre">cronjob</span></tt> out of this to keep the
local copy automatically up-to-date. The PDB ftp site can also be
specified (see API documentation).</p>
<p><tt class="docutils literal"><span class="pre">PDBList</span></tt> has some additional methods that can be of use. The
<tt class="docutils literal"><span class="pre">get_all_obsolete</span></tt> method can be used to get a list of all obsolete
PDB entries. The <tt class="docutils literal"><span class="pre">changed_this_week</span></tt> method can be used to obtain the
entries that were added, modified or obsoleted during the current week.
For more info on the possibilities of <tt class="docutils literal"><span class="pre">PDBList</span></tt>, see the API
documentation.</p>
</div>
</div>
<div class="section" id="general-questions">
<h2>11.9  General questions<a class="headerlink" href="#general-questions" title="Permalink to this headline">¶</a></h2>
<div class="section" id="how-well-tested-is-bio-pdb">
<h3>11.9.1  How well tested is Bio.PDB?<a class="headerlink" href="#how-well-tested-is-bio-pdb" title="Permalink to this headline">¶</a></h3>
<p>Pretty well, actually. Bio.PDB has been extensively tested on nearly
5500 structures from the PDB - all structures seemed to be parsed
correctly. More details can be found in the Bio.PDB Bioinformatics
article. Bio.PDB has been used/is being used in many research projects
as a reliable tool. In fact, I’m using Bio.PDB almost daily for research
purposes and continue working on improving it and adding new features.</p>
</div>
<div class="section" id="how-fast-is-it">
<h3>11.9.2  How fast is it?<a class="headerlink" href="#how-fast-is-it" title="Permalink to this headline">¶</a></h3>
<p>The <tt class="docutils literal"><span class="pre">PDBParser</span></tt> performance was tested on about 800 structures (each
belonging to a unique SCOP superfamily). This takes about 20 minutes, or
on average 1.5 seconds per structure. Parsing the structure of the large
ribosomal subunit (1FFK), which contains about 64000 atoms, takes 10
seconds on a 1000 MHz PC. In short: it’s more than fast enough for many
applications.</p>
</div>
<div class="section" id="is-there-support-for-molecular-graphics">
<h3>11.9.3  Is there support for molecular graphics?<a class="headerlink" href="#is-there-support-for-molecular-graphics" title="Permalink to this headline">¶</a></h3>
<p>Not directly, mostly since there are quite a few Python based/Python
aware solutions already that can potentially be used with Bio.PDB. My
choice is PyMol, BTW (I’ve used this successfully with Bio.PDB, and
there will probably be specific PyMol modules in Bio.PDB soon/some day).
Python based/aware molecular graphics solutions include:</p>
<ul class="simple">
<li>PyMol:
<a class="reference external" href="http://pymol.sourceforge.net/">http://pymol.sourceforge.net/</a></li>
<li>Chimera:
<a class="reference external" href="http://www.cgl.ucsf.edu/chimera/">http://www.cgl.ucsf.edu/chimera/</a></li>
<li>PMV:
<a class="reference external" href="http://www.scripps.edu/~sanner/python/">http://www.scripps.edu/~sanner/python/</a></li>
<li>Coot:
<a class="reference external" href="http://www.ysbl.york.ac.uk/~emsley/coot/">http://www.ysbl.york.ac.uk/~emsley/coot/</a></li>
<li>CCP4mg:
<a class="reference external" href="http://www.ysbl.york.ac.uk/~lizp/molgraphics.html">http://www.ysbl.york.ac.uk/~lizp/molgraphics.html</a></li>
<li>mmLib:
<a class="reference external" href="http://pymmlib.sourceforge.net/">http://pymmlib.sourceforge.net/</a></li>
<li>VMD:
<a class="reference external" href="http://www.ks.uiuc.edu/Research/vmd/">http://www.ks.uiuc.edu/Research/vmd/</a></li>
<li>MMTK:
<a class="reference external" href="http://starship.python.net/crew/hinsen/MMTK/">http://starship.python.net/crew/hinsen/MMTK/</a></li>
</ul>
</div>
<div class="section" id="whos-using-bio-pdb">
<h3>11.9.4  Who’s using Bio.PDB?<a class="headerlink" href="#whos-using-bio-pdb" title="Permalink to this headline">¶</a></h3>
<p>Bio.PDB was used in the construction of DISEMBL, a web server that
predicts disordered regions in proteins
(<a class="reference external" href="http://dis.embl.de/">http://dis.embl.de/</a>), and COLUMBA, a
website that provides annotated protein structures
(<a class="reference external" href="http://www.columba-db.de/">http://www.columba-db.de/</a>). Bio.PDB
has also been used to perform a large scale search for active site
similarities between protein structures in the PDB
[<a class="reference external" href="#hamelryck2003b">19</a>, Hamelryck, 2003], and to develop a new
algorithm that identifies linear secondary structure elements
[<a class="reference external" href="#majumdar2005">26</a>, Majumdar <em>et al.</em>, 2005].</p>
<p>Judging from requests for features and information, Bio.PDB is also used
by several LPCs (Large Pharmaceutical Companies :-).</p>
</div>
</div>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar">
        <div class="sphinxsidebarwrapper">
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="footer">
        &copy; Copyright 2013, Biopython.
      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.2b1.
    </div>
  </body>
</html>